An architecture for tolerating processor failures in shared-memory multiprocessors
Related resources
An Architecture for Tolerating Processor Failures in Shared Memory Multiprocessors
In this paper, we focus on the problem of recovering from processor failures in shared memory multiprocessors. We propose an architecture designed for transparently tolerating processor failures. The Recoverable Shared Memory (RSM) is the main component of this architecture, providing a hardware-supported backward error recovery mechanism. This technique copes with standard caches and cache cohe...
Tolerating Processor Failures in a Distributed Shared-Memory Multiprocessor
Scaling transistor geometries and increasing levels of integration lead to rising transient- and permanent-fault rates. Future server platforms must combine reliable computation with cost and performance scalability, without sacrificing application portability. Processor reliability, for both transient and permanent faults, represents the most challenging aspect of designing reliable, available ser...
Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors
The large latency of memory accesses is a major obstacle to obtaining high processor utilization in large-scale shared-memory multiprocessors. Although the provision of coherent caches in many recent machines has alleviated the problem somewhat, cache misses still occur frequently enough that they significantly lower performance. In this paper we evaluate the effectiveness of non-binding softwa...
A Novel Lightweight Directory Architecture for Scalable Shared-Memory Multiprocessors
Two important hurdles restrict the scalability of directory-based shared-memory multiprocessors: the directory memory overhead and the long L2 miss latencies caused by the indirection of accessing directory information, which is usually stored in main memory. This work presents a lightweight directory architecture aimed at addressing these two problems. Our proposal ta...
Dynamic Data Replication for Tolerating Single Node Failures in Shared Virtual Memory Clusters of Workstations
In this paper we investigate how shared memory clusters can take advantage of replication to tolerate single system failures. We start from a shared virtual memory protocol (GeNIMA) that has been optimized for low-latency, high-bandwidth system area networks. We propose a set of extensions that keep shared data consistent in the presence of failures and support SMP nodes. Our scheme uses dyn...
Journal
Journal title: IEEE Transactions on Computers
Year: 1996
ISSN: 0018-9340
DOI: 10.1109/12.543705